* Cleaning of Indonesian micro data from Survei Industri

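* cleaning steps:
	* (1) Drop if data on materials or sales is missing (needed to establish whether importer / exporter)
	* (2) Drop firms with gaps in their yearly data
	* (3) Drop firms with extreme year-to-year growth in sales or materials (bottom or top 1% within year)
	* (4) Drop observations with export shares above 100
	* (5) Drop observations duplicated on id and year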

use "temp/SI.dta", clear
set more off
order id year, first

* Drop if data to establish whether importer / exporter is missing
drop if materials==.
drop if sales==.

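* Eliminate firms with non-consecutive runs of data
* The gap is computed within firm, so that a firm's first observation is not
* compared with the last observation of the previous firm
sort id year
by id: gen gap = year - year[_n-1]
egen max_gap = max(gap), by(id)
* Firms with a single observation have no gap
replace max_gap=1 if max_gap==.
drop if max_gap > 1
drop gap max_gap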
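
* Optional check (a sketch, not part of the original cleaning): after the step
* above, consecutive observations of a firm should be at most one year apart.
* gap_chk is a hypothetical scratch variable and is dropped again below.
sort id year
by id: gen gap_chk = year - year[_n-1] if _n > 1
capture assert gap_chk <= 1 if !missing(gap_chk)
if _rc display as error "some firms still have gaps between consecutive years"
drop gap_chk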

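* Drop outliers: flag observations below the 1st or above the 99th percentile
* of the within-year distribution of sales growth and of materials growth.
* The upper tail is flagged only for observations with Entry==0.
sort year id
bys year: egen salesgrowth99 = pctile(salesgrowth), p(99)
gen I99=0
replace I99=1 if salesgrowth>salesgrowth99 & Entry==0
bys year: egen salesgrowth01 = pctile(salesgrowth), p(1)
gen I01=0
replace I01=1 if salesgrowth<salesgrowth01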

bys year: egen materialsgrowth01 = pctile(materialsgrowth), p(1)
gen I01m=0
replace I01m=1 if materialsgrowth<materialsgrowth01
bys year: egen materialsgrowth99 = pctile(materialsgrowth), p(99)
gen I99m=0
replace I99m=1 if materialsgrowth>materialsgrowth99 & Entry==0
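
* Diagnostic sketch (an addition, not part of the original cleaning): Stata
* treats missing values as larger than any number, so the ">" comparisons above
* are also true when the growth rate is missing. These counts show how many
* upper-tail flags come from missing growth rates rather than from large values.
* If that is not intended, adding "& !missing(...)" to the conditions would
* exclude them.
count if I99==1 & missing(salesgrowth)
count if I99m==1 & missing(materialsgrowth)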

* Drop every firm that has at least one flagged observation in any year
bys id: egen totalI99 = total(I99)
bys id: egen totalI99m = total(I99m)
bys id: egen totalI01 = total(I01)
bys id: egen totalI01m = total(I01m)
keep if totalI99==0
keep if totalI99m==0
keep if totalI01==0
keep if totalI01m==0
drop I99 I99m I01 I01m totalI99 totalI99m totalI01 totalI01m

* Drop observations with export shares greater than 100 (missing shares are kept)
drop if exportshare>100 & exportshare!=.

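* Drop duplicates on id and year (these arise in 2001)
* All copies of a duplicated id-year are dropped, none is kept;
* isdup counts the extra copies, so isdup>0 also covers triplicates
duplicates tag id year, gen(isdup)
drop if isdup>0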
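
* Optional check (a sketch, not in the original file): once duplicates are
* removed, id and year should uniquely identify observations. isid also
* fails if id or year contains missing values.
capture isid id year
if _rc display as error "id and year do not uniquely identify observations"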
do "dofiles/constructvariables2_indo.do"
save "temp/SI_clean.dta", replace
